cuDF を使ってGPUでDataFrameを処理してみよう
2017:
• Prototyped pygdf, a pandas-like GPU DataFrame • Prototyped dask_gdf, a distributed pygdf
• Ingest data from MapD into PyGDF and do ETL
• Export feature matrices to H2O GLM
2018
• Nvidia created RAPIDS
• pygdf evolved into cudf
• dask_gdf evolved into dask_cudf
https://gyazo.com/4f8884148fd7ef7457fd3cbfb212ba45
Use Arrow format to share data • in-memory
• columnar
• no-copy
• open-standard
• spark, hadoop, parquet, impala, etc..
Using Numba
• Use Numba for GPU functions and GPU arrays • JIT user-defined functions as GPU kernels
• Interactive CUDA programming in notebooks
https://gyazo.com/2486bb323a5b56355528d31fbb82e141
参考
RAPIDS - Open GPU Data Scienece